Accelerated Parametric Chamfer Alignment Using a Parallel, Pipelined GPU Realization
ثبت نشده
چکیده
Parametric chamfer alignment (PChA) is commonly employed for aligning an observed set of points with a corresponding set of reference points. PChA estimates optimal geometric transformation parameters that minimize an objective function formulated as the sum of the squared distances from each transformed observed point to its closest reference point. A distance transform enables efficient computation of the (squared) distances and the objective function minimization is commonly performed via the Levenberg-Marquardt (LM) non-linear least squares iterative optimization algorithm. The pointwise computations of the objective function, gradient, and Hessian approximation required for the LM iterations make PChA computationally demanding for large scale data sets. We propose an acceleration of the PChA via a parallelized and pipelined realization that is particularly wellsuited for large scale data sets and for modern GPU architectures. Specifically, we partition the observed points among the GPU blocks and decompose the expensive LM calculations in correspondence with the GPU’s single instruction multiple thread (SIMT) architecture to significantly speed-up this bottleneck step for PChA on large-scale datasets. Additionally, by re-ordering computations, we propose a novel pipelining of the LM algorithm that offers further speed-up by exploiting the low arithmetic latency of the GPU compared with its high global memory access latency. Results obtained on two different platforms for both 2D and 3D large-scale point data sets from our ongoing research demonstrate that the proposed PChA GPU implementation provides a significant speed-up over its single CPU counterpart.
منابع مشابه
Efficient implementation of low time complexity and pipelined bit-parallel polynomial basis multiplier over binary finite fields
This paper presents two efficient implementations of fast and pipelined bit-parallel polynomial basis multipliers over GF (2m) by irreducible pentanomials and trinomials. The architecture of the first multiplier is based on a parallel and independent computation of powers of the polynomial variable. In the second structure only even powers of the polynomial variable are used. The par...
متن کاملImplementation of the direction of arrival estimation algorithms by means of GPU-parallel processing in the Kuda environment (Research Article)
Direction-of-arrival (DOA) estimation of audio signals is critical in different areas, including electronic war, sonar, etc. The beamforming methods like Minimum Variance Distortionless Response (MVDR), Delay-and-Sum (DAS), and subspace-based Multiple Signal Classification (MUSIC) are the most known DOA estimation techniques. The mentioned methods have high computational complexity. Hence using...
متن کاملDesign and Implementation of Digital Demodulator for Frequency Modulated CW Radar (RESEARCH NOTE)
Radar Signal Processing has been an interesting area of research for realization of programmable digital signal processor using VLSI design techniques. Digital Signal Processing (DSP) algorithms have been an integral design methodology for implementation of high speed application specific real-time systems especially for high resolution radar. CORDIC algorithm, in recent times, is turned out to...
متن کاملComputing resultants on Graphics Processing Units: Towards GPU-accelerated computer algebra
In this article we report on our experience in computing resultants of bivariate polynomials on Graphics Processing Units (GPU). Following the outline of Collins’ modular approach [6], our algorithm starts by mapping the input polynomials to a finite field for sufficiently many primes m. Next, the GPU algorithm evaluates the polynomials at a number of fixed points x ∈ Zm, and computes a set of ...
متن کاملAcceleration of the Smith-Waterman algorithm using single and multiple graphics processors
Finding regions of similarity between two very long data streams is a computationally intensive problem referred to as sequence alignment. Alignment algorithms must allow for imperfect sequence matching with different starting locations and some gaps and errors between the two data sequences. Perhaps the most well known application of sequence matching is the testing of DNA or protein sequences...
متن کامل